Abstract

Financial data has been extensively studied for correlations using Pearson's
cross-correlation coefficient ρ as the point of departure. We employ an
estimator based on recurrence plots, the Correlation of Probability of
Recurrence (CPR), to analyze connections between nine stock indices spread
worldwide. We suggest a slight modification of the CPR approach in order to
get more robust results. We examine trends in CPR for an approximately
19-month window moved along the time series and compare them to ρ. Binning
CPR into three levels of connectedness (strong, moderate and weak), we
extract the trends in the number of connections in each bin over time. We
also look at the behavior of CPR during the Dot-Com bubble by shifting the
time series to align their peaks. CPR mainly uncovers that the markets move
in and out of periods of strong connectivity erratically, instead of moving
monotonically towards increasing global connectivity. This is in contrast to
ρ, which gives a picture of ever-increasing correlation. CPR also shows that
time-shifted markets have high connectivity around the Dot-Com bubble of
2000. We stress the importance of significance testing in interpreting
measures applied to field data. CPR is more robust to significance testing.
It has the additional advantages of being robust to noise, and reliable for
short time series lengths and low frequency of sampling. Further, it is more
sensitive to changes than ρ as it captures correlations between the essential
dynamics of the underlying systems.

Keywords: correlation, stock indices, recurrence plots, econophysics 



1. Introduction 

Time series of stock markets give some insight into the rather macroscopic
dynamics of the underlying systems. An important aspect is to study the
interrelations among dynamical stock indices. However, to answer the question
"How do connections between stock markets change over time?", a measure of
connectedness must first be arrived at. In earlier studies, Pearson's
cross-correlation coefficient ρ served as a proxy for 'links' between
financial data sets [1-18]. Its extensive usage has made Pearson's ρ
synonymous with the notion of correlation, and thereby connections, itself.
However, recent developments in nonlinear data analysis have suggested
alternative approaches for estimating connections using, e.g., the study of
recurrences: the Recurrence Plot (RP), Cross Recurrence Plot (CRP), and Joint
Recurrence Plot (JRP) [19]. In our study, we estimate connections between
financial data sets from the recurrences of dynamical systems. We use the
Correlation of Probability of Recurrence (CPR), which is based on RPs and was
originally devised to quantify phase synchronization between
non-phase-coherent and non-stationary time series [19, 20]. As the notion of
synchronization is innately bound to those of connectedness and co-movement,
we argue that CPR too can serve as a measure for connectedness.

The potential of recurrences in analyzing financial data has been explored in
several studies, e.g., in correlation analysis among currencies [21],
identification of the nature of crashes [22], estimation of the initial time
of a bubble [23], and in quantifying the behaviour of global stock markets
during financial crises [24]. Furthermore, cross recurrence analysis has been
used to look at synchronicity and convergence of the GDP among member nations
of the Euro region [25].

The main objectives of this work are to re-examine commonly held notions of
connectedness and explore the potential of using (a slightly modified) CPR
as a measure of connectivity between stock indices. We use CPR to formulate
connectivity trends between stock indices over almost two decades and compare
them to the trends given by ρ. We also underscore the importance of using
significance tests based on surrogate data sets while interpreting results
obtained from field data. In particular, we apply Twin Surrogates [26],
another recurrence-based algorithm, for generating surrogates of our financial
time series.

We argue for CPR as a suitable measure of connectivity, based on its
fundamental nature and its ability to extract information from relatively
poor data sets.

In Sec. 2 we outline the basic idea behind our study. Sec. 3 outlines the
underlying theory and Sec. 4 elaborates on the methods used to analyze the
data. Sec. 5 states the main results and their implications.

2. Measuring connectivity 

The common thread in connectivity studies of financial data has been
co-movement, and it has been referred to with varied terms like correlation
[1, 2], synchronization [27] and cointegration [28, 29]. Each of them
incorporates a mathematical formulation that captures some aspect of
co-movement for a time series pair. In measuring connectivity we look at
another feature that can imply connectedness: similarity. If two data sets
are similar in an aspect that we can measure, then they are closer to each
other than either is to a third data set with which neither shares as much
common ground. Similarity is a more general feature, one of whose
manifestations might be co-movement. In our analysis based on a recurrence
approach, we intend to take into account the real evolution typical for
financial markets better than the classic correlation analysis does.
Therefore, we make the following basic assumptions:

i. The time series we deal with are the output of black boxes, i.e., those 
systems whose dynamics and model equations are unknown. 

ii. The dynamics of such systems may change over time (non-stationarity). 

iii. The change in dynamical nature is itself a characteristic feature of the 
system. 

iv. The time series may have features that are common to all of them (e.g.,
power spectra and clustered volatilities) and further, these similarities
are quantifiable.

v. The quantifiable features are representative of the underlying dynamical
nature of the system.

The quantifiable feature we study here is the probability of recurrence (see
Sec. 3.2), and the similarity is captured by the cross-correlation between the
probabilities of recurrence of pairs of time series (see Sec. 3.3).



3. Theory 

3.1. Recurrence Plots 

A Recurrence Plot (RP) is a visual tool that shows the recurrence patterns
of a dynamical system [30]. A recurrence is defined as the return of the
trajectory of a system to an earlier state. In practice, a recurrence is said
to occur when the system returns to the neighborhood of an earlier point
in the phase space. Mathematically, given a point x_i of a trajectory
x_1, x_2, ..., x_N, the recurrence matrix R is estimated as:

$$ R_{ij}(\varepsilon) = \Theta(\varepsilon - \| x_i - x_j \|), \quad i, j = 1, \ldots, N, \tag{1} $$

where N is the number of points, ε is an appropriate threshold distance, Θ(·)
is the Heaviside function (i.e., Θ(a) = 0 if a < 0, and 1 if a ≥ 0) and ‖·‖ is
an appropriate norm. R is a matrix of 0s and 1s and an RP is a graphical
representation of R obtained by, e.g., marking a black dot for every 1 and a
white dot for every 0.
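
As a rough illustration of Eq. (1), the following Python sketch builds the
recurrence matrix of a scalar time series without embedding, taking the
absolute difference as the norm. The function name `recurrence_matrix` and the
toy input are illustrative assumptions and not part of the original analysis.

```python
def recurrence_matrix(series, eps):
    """Return R with R[i][j] = 1 if |x_i - x_j| <= eps, else 0.

    Assumes a scalar, non-embedded series, so the norm is the absolute value.
    """
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy series: the points at indices 0, 1 and 3 lie within eps of each other.
R = recurrence_matrix([0.0, 0.05, 1.0, 0.02], eps=0.1)
for row in R:
    print(row)
```

Plotting the 1s of this matrix as black dots would give the RP itself.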



RPs capture the essential features of a system [19, 30]. RPs of three different
types of data sets, viz., uniform white noise, the chaotic Lorenz system, and
daily financial data, are shown in Fig. 1. All three plots are distinct from
each other and characteristic of the system.
Next, we give some measures to characterize RPs.

3.2. Probability of Recurrence 

The Probability of Recurrence p(τ), also called the τ-recurrence rate, is
the recurrence rate of a diagonal line situated at τ steps from the main
diagonal, i.e., of R_{i,i+τ} for all i = 1, ..., N − τ [19]. It is a
probabilistic measure that gives the probability of an (i + τ)th point falling
in the ε-neighborhood of the ith point:

$$ p(\tau) = \frac{1}{N - \tau} \sum_{i=1}^{N-\tau} R_{i,\,i+\tau}. \tag{2} $$

Figure 1: Recurrence plots for three different types of data. (a) Uniform white
noise. (b) The Lorenz system with σ = 10, ρ = 28 and β = 10/3. (c) Daily data from
DAX. Here, i and j are the index values for the time series entries.

Figure 2: p(τ) curves for the three different types of data given in Fig. 1. (a)
Uniform white noise. (b) The Lorenz system with σ = 10, ρ = 28 and β = 10/3. (c) Daily
data from DAX.



It can be considered as a generalized form of an autocorrelation function that
statistically reflects the time scales on which the system tends to return
to a previous configuration. For instance, the uniform white noise time series
of Fig. 1(a) has almost the same probability of recurring to an earlier state
for all values of τ (Fig. 2(a)), while the chaotic Lorenz system of Fig. 1(b)
has periodic tendencies for high recurrences but with decreasing intensity
(Fig. 2(b)), and the probabilities of recurrence for the daily DAX data from
Fig. 1(c) decrease (without periodicities) with increasing τ, indicating the
chances of a drift in the data set (Fig. 2(c)).
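
The probability of recurrence of Eq. (2) can be sketched in a few lines of
Python. This is a minimal illustration for a scalar, non-embedded series; the
function name `recurrence_probability` and the `max_lag` argument are
assumptions introduced here for convenience.

```python
def recurrence_probability(series, eps, max_lag=None):
    """Return [p(0), p(1), ..., p(max_lag)] for a scalar, non-embedded series."""
    n = len(series)
    if max_lag is None:
        max_lag = n - 1
    probs = []
    for tau in range(max_lag + 1):
        # Count recurrences along the diagonal at distance tau from the main one.
        hits = sum(
            1 for i in range(n - tau) if abs(series[i + tau] - series[i]) <= eps
        )
        probs.append(hits / (n - tau))
    return probs


# A period-2 series recurs at every even lag and never at odd lags.
print(recurrence_probability([0, 1, 0, 1, 0, 1], eps=0.1, max_lag=3))
```

Because each lag τ averages over only N − τ pairs, the estimate becomes
noisier at large lags, which is one practical reason to cap `max_lag`.
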
3.3. Correlation of Probability of Recurrence

The Correlation of Probability of Recurrence (CPR) is defined as the
cross-correlation coefficient between the probabilities of recurrence of two
trajectories x and y [19, 20]:

$$ \mathrm{CPR} = \langle \bar{p}_x(\tau)\, \bar{p}_y(\tau) \rangle, \tag{3} $$

where ⟨·⟩ represents the expectation value and $\bar{x}$ is the series x
normalized to zero mean and standard deviation one (henceforth,
'normalization' refers to this particular way of normalizing a time series).

However, all the p(τ) curves in Fig. 2 start from p(0) = 1, because the
recurrence rate is always 1 at τ = 0, the main diagonal. This initial portion
of the p(τ) curve, common to all trajectories, introduces a bias towards a
high CPR value. To evaluate CPR correctly, we suggest considering p(τ) only
for τ larger than the autocorrelation time τ_c of the system (defined as the
delay τ at which the autocorrelation function of the system falls to 1/e):

$$ \mathrm{CPR} = \langle \bar{p}_x(\tau > \tau_c)\, \bar{p}_y(\tau > \tau_c) \rangle, \tag{4} $$

where

$$ \tau_c = \max \{ \tau_c(x), \tau_c(y) \}. \tag{5} $$

This is shown in Fig. 3, which illustrates the steps involved in estimating
CPR. The CPR between the two time series according to Eq. (3) is 0.892,
whereas it is 0.575 according to Eqs. (4) and (5).

CPR characterizes the degree of phase synchronization between two time
series, with CPR ≈ 1 implying that the two systems are phase synchronized
[19]. However, it can be interpreted more generally as a measure denoting the
level to which two trajectories x and y have similar time scales of recurrence.
In this study, this means that two financial time series with a high CPR
tend to recur at similar times, suggesting some similarity in their underlying
dynamics.
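
The modified CPR of Eqs. (4) and (5) can be sketched as below. This is a
minimal illustration under stated assumptions rather than the implementation
used for the results: the series are scalar and of equal length, the
autocorrelation time is taken as the first lag at which the sample
autocorrelation is at or below 1/e, and lags beyond half the series length are
dropped (the `max_lag` default is an assumption made here). All function names
are hypothetical, and there are no guards for degenerate inputs such as a
constant p(τ).

```python
import math


def normalize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def recurrence_probability(series, eps, max_lag):
    """p(tau) for tau = 0..max_lag, scalar series, no embedding."""
    n = len(series)
    return [
        sum(1 for i in range(n - tau) if abs(series[i + tau] - series[i]) <= eps)
        / (n - tau)
        for tau in range(max_lag + 1)
    ]


def autocorrelation_time(series):
    """First lag at which the sample autocorrelation is at or below 1/e."""
    z = normalize(series)
    n = len(z)
    for tau in range(1, n):
        acf = sum(z[i] * z[i + tau] for i in range(n - tau)) / n
        if acf <= 1 / math.e:
            return tau
    return n - 1


def cpr(x, y, eps=0.1, max_lag=None):
    """Modified CPR: correlate p_x and p_y only for tau_c < tau <= max_lag."""
    x, y = normalize(x), normalize(y)
    if max_lag is None:
        max_lag = len(x) // 2  # assumption: ignore poorly sampled long lags
    tau_c = max(autocorrelation_time(x), autocorrelation_time(y))
    px = normalize(recurrence_probability(x, eps, max_lag)[tau_c + 1:])
    py = normalize(recurrence_probability(y, eps, max_lag)[tau_c + 1:])
    return sum(a * b for a, b in zip(px, py)) / len(px)


# Toy check: a periodic series compared with itself.
wave = [math.sin(2 * math.pi * t / 20) for t in range(200)]
print(round(cpr(wave, wave), 6))
```

Dropping the slice `[tau_c + 1:]` and correlating from τ = 0 would correspond
to the unmodified definition of Eq. (3).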

3.4. Pearson correlation 

The Pearson correlation coefficient (ρ) is commonly used to analyze
correlations between financial data. It is then used to define the distance
between the data sets as given in [1]. It is simply the expectation value of
the product of the normalized time series:

$$ \rho_{x,y} = \langle \bar{x}\, \bar{y} \rangle, \tag{6} $$

where ⟨·⟩ and the normalized series $\bar{x}$, $\bar{y}$ are defined as before.

Figure 3: Calculating CPR from p(τ). (a) Normalized daily data for 200 days from
two stock indices. (b) p(τ) curves for these time series. τ_c = 13 is shown with a vertical
line. (c) Normalized p(τ) curves beyond τ_c. (d) The product of the two series in (c). The
horizontal line is the mean of this series, which is the CPR.

3.5. Significance testing and Twin Surrogates

To use the CPR alone for interpretations might be misleading. In an 
active experiment (such as laboratory experiments or numerical simulations 
of model systems) where the parameters of the system can be controlled, the 
CPR for different parameter sets can be compared and thus, a consistent 
interpretation is possible. However, the financial time series we use are the 
only realizations of the black boxes generating them. In such a passive exper- 
iment, where the parameters of the system cannot be changed or controlled, 
or its dynamics are unknown, it is crucial to generate surrogate time series 
and check for the statistical significance of the observed CPR against the 
distribution of CPR obtained from the surrogate data sets. This is done 
using a statistical test and an appropriate null hypothesis. 
The Twin Surrogates (TS) algorithm is a recurrence-based method of
generating surrogate time series [26]. A pair of points x_i and x_j of a
trajectory x (of length N) are called twins if R_{k,i} = R_{k,j} for all
k = 1, 2, ..., N. This means that, barring their exact positions in the
trajectory, these two points have the same neighborhood in phase space. The TS
method is an iterative algorithm that involves:

i. Identifying twins from the recurrence plot of the trajectory x.

ii. Taking any arbitrary point x_l ∈ x as the starting point of the surrogate
trajectory s.

iii. Iteratively adding subsequent points to s as: if x_i ∈ x is the previous
point of s, and x_i has no twins, then the next point of s is simply x_{i+1};
whereas if x_i has n − 1 twins, then the next point of s is the successor of
any one of that particular set of n twins, chosen with equal probability.
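
The three steps above can be sketched as follows. This is a simplified
illustration for a scalar, non-embedded series and not the reference
implementation of [26]. Two choices are assumptions made here: a point counts
as its own twin, and when the current point and all its twins sit at the end
of the series (so no successor exists) the surrogate restarts from a random
index. The names `find_twins` and `twin_surrogate` are hypothetical.

```python
import random


def recurrence_matrix(series, eps):
    """R[i][j] = 1 if |x_i - x_j| <= eps, else 0 (scalar series, no embedding)."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def find_twins(R):
    """Map each index to all indices sharing its column of R (itself included)."""
    n = len(R)
    groups = {}
    for j in range(n):
        column = tuple(R[k][j] for k in range(n))
        groups.setdefault(column, []).append(j)
    return {j: members for members in groups.values() for j in members}


def twin_surrogate(series, eps, rng=None):
    """Generate one surrogate of the same length as the input series."""
    rng = rng or random.Random()
    n = len(series)
    twins = find_twins(recurrence_matrix(series, eps))
    i = rng.randrange(n)  # step ii: arbitrary starting point
    surrogate = [series[i]]
    while len(surrogate) < n:
        # Step iii: move to the successor of the current point or of a twin.
        candidates = [k for k in twins[i] if k + 1 < n]
        if candidates:
            i = rng.choice(candidates) + 1
        else:
            i = rng.randrange(n)  # assumption: restart if no successor exists
        surrogate.append(series[i])
    return surrogate


# For a strictly periodic series every surrogate keeps the 0 -> 1 -> 2 cycle.
print(twin_surrogate([0.0, 1.0, 2.0] * 3, eps=0.1, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the surrogate reproducible, which is
convenient when the same set of surrogates is reused across measures.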

Although there are several methods to generate surrogate time series, it is
natural to use TS in this study. In the widespread iterative Amplitude
Adjusted Fourier Transform (iAAFT) method, surrogates are created with the
assumed null hypothesis that the observed time series is the result of a
Gaussian process seen through a static linear filter whereas, in TS, each
surrogate is an independent realization of the observed time series differing
only in the initial conditions. iAAFT surrogates do not preserve nonlinear
characteristics such as mutual information, whereas TS preserves linear as
well as nonlinear properties of a system [26]. Fig. 4(a) shows the original
time series from DAX and both the surrogates via iAAFT and TS. Although the
errors in autocorrelation for both methods are comparable, the iAAFT surrogate
has a much larger error in mutual information than the TS (Figs. 4(b)-(e)).

Figure 4: Comparison of Twin Surrogates and the iAAFT methods. (a) Daily
data for 1942 days from DAX (bold) (scaled to lie between 0 and 1), and a realization
of its corresponding Twin Surrogate (dashed) and iAAFT surrogate (dotted). (b) The
autocorrelation functions of the three sets of data in (a). (c) The error in autocorrelation of
the Twin Surrogate (dashed) and the iAAFT surrogate (dotted). (d) Mutual information
of the data sets in (a). (e) The error in mutual information of the Twin Surrogate and
iAAFT surrogate. Keys for (c) and (e) are the same as in (b) and (d) respectively.

We use the TS to generate a test distribution against which we test the
significance of the observed measure M (which can be either CPR or ρ). The null
hypothesis for this significance test is that each surrogate is an independent
trajectory of the same dynamical system which gave rise to the observed
time series. This means that when we test for significance, we check for the
probability that an independent realization of one of the time series (as given
by its TS) can give a similar value of M. The steps involved in the test are:

i. The value of M between time series A and B is estimated and designated
as M_0 (say).

ii. TS are generated from the series B.

iii. M is calculated between each surrogate and the series A. This gives the
test distribution of M. We assume that this distribution is roughly normal
as each TS corresponds to a distinct trajectory of system B starting
from an independent initial condition.

iv. The mean μ and standard deviation σ are estimated for this test
distribution.

v. The test statistic Z is then evaluated as:

$$ Z = \frac{M_0 - \mu}{\sigma}, \tag{7} $$

which can be used to infer the probability with which Z belongs to a
standard normal distribution.

A cut-off for the probability is fixed in advance, below which M_0 is said to
be significantly different from the test distribution; this is the
significance level of the test. The probability value (or p-value) obtained
from the standard normal table for the test statistic Z represents the
probability that M_0 might actually be from the test distribution. In terms of
the null hypothesis (that the two time series are independent), a p-value
below the significance level means that the null hypothesis is rejected.
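
Steps iv and v of the test can be sketched as below, given the observed value
M_0 and the values of M obtained from the surrogates. Two details are
assumptions made for this sketch, since the text does not fix them: the sample
standard deviation is used, and the p-value is two-sided. The function name
`z_test` is hypothetical.

```python
import math
import statistics


def z_test(observed, surrogate_values):
    """Return (Z, p_value) of an observed measure against surrogate values."""
    mu = statistics.mean(surrogate_values)
    sigma = statistics.stdev(surrogate_values)  # assumption: sample std
    z = (observed - mu) / sigma
    # Two-sided tail probability of a standard normal variable (assumption).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Toy example: hypothetical surrogate values spread around zero.
z, p = z_test(0.9, [-0.2, 0.1, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0])
print(z, p, p < 0.10)
```

With a 10% significance level, the observed value would be called significant
whenever the returned p-value is below 0.10.
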
Table 1: Test data: Market indices and their locations

Label   Stock Index    Location
A       CAC 40         France
B       FTSE 100       U.K.
C       DAX            Germany
D       NASDAQ         U.S.A.
E       DJIA           U.S.A.
F       S&P 500        U.S.A.
G       Nikkei 225     Japan
H       Hang Seng      China
I       Strait Times   Singapore

4. Methods 

4.1. The data set

The daily close values ranging from 3rd December 1990 to 30th April 2010
of nine stock indices (given in Table 1) from around the world were used
in our analysis. Three were from Asia, three from Europe and three from
the U.S.A. The data was obtained from http://finance.yahoo.com/. Each
data set contained a numerical value (representing the close value of the index
on a particular date) and its corresponding date.

4.2. Preprocessing the data set

The data sets had unequal lengths because of the unequal distribution of
holidays for stock indices in different regions of the world. To align them
temporally, mismatched dates (and the corresponding close values) were deleted,
i.e., any date missing from one or more of the nine markets was deleted (along
with the corresponding close value) from all the markets. Simply put, the
common intersection of all nine data sets, in terms of dates, was obtained,
which was of length 4238 time points.
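
The date alignment described above amounts to a set intersection. A minimal
sketch, assuming each market is held as a mapping from ISO date strings to
close values (a hypothetical data layout, as is the function name):

```python
def align_on_common_dates(markets):
    """Keep only the dates present in every market.

    markets: dict mapping a market label to a dict {date_string: close_value}.
    Returns the sorted common dates and the aligned close values per market.
    """
    common = set.intersection(*(set(series) for series in markets.values()))
    dates = sorted(common)  # ISO strings sort chronologically
    aligned = {
        name: [series[d] for d in dates] for name, series in markets.items()
    }
    return dates, aligned


# Toy example with made-up values: market B is closed on 2010-01-05.
markets = {
    "A": {"2010-01-04": 1.0, "2010-01-05": 2.0, "2010-01-06": 3.0},
    "B": {"2010-01-04": 10.0, "2010-01-06": 30.0},
}
dates, aligned = align_on_common_dates(markets)
print(dates)
print(aligned)
```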

Moreover, the index values of the different indices were arbitrary (Fig. 5(a)).
To make qualitative comparisons possible, they were normalized to mean
zero and a standard deviation of one (Fig. 5(b)). This enabled us to compare
the scaled time series and all the recurrence-based measures obtained
from them using the same value of the parameters, e.g., the same value of
the recurrence threshold ε.

Figure 5: Normalizing the data. (a) Daily close values for Dow Jones Industrial Average
(DJIA) (bold) and Strait Times (dotted). (b) The time series in (a) after normalization
(note the difference in the vertical axes in both figures). The left vertical axis is for DJIA
while the right vertical axis is for Strait Times.

4.3. Selection of parameters: Preliminary analysis

4.3.1. Window size

The primary method in this study is to slide a window of fixed width
along the time series and then estimate CPR or ρ. However, because of the
deletion of dates (see Sec. 4.2), if a window of, say, 500 consecutive time
points is chosen, the actual range of days contained in it is larger than 500
and further, this range changes as the window moves along the time series
(Fig. 6(a)-(c)). In order to get an understanding of the variation in the range
of days contained in a window with respect to the number of time points in
the window, we varied the window size from 150 to 350 time points and
estimated the standard deviation of the range of days contained in each window,
and found that the standard deviation increases (almost) monotonically with
increasing window size (Fig. 6(d)).

Ideally, we want a window with negligible standard deviation. However, this
would mean reducing the window size, which would reduce the effectiveness
of the RP and, in turn, the measures estimated from it. As a compromise,
we choose a window of 250 time points, which is partly arbitrary, but we
(reasonably) assume that the qualitative features of the results are not
severely affected by increasing or decreasing the window size from 250 by a
small margin. The mean range of days contained in a 250 time point window is
approximately 416 days, which is roughly equal to 19 months (considering a
5-day week).
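
The spread in the range of days covered by a fixed number of time points can
be estimated as sketched below. How the range is counted is an assumption
here: the remark about a 5-day week suggests counting weekdays between the
first and last date of a window, so that is what this sketch does. Function
names are hypothetical.

```python
from datetime import date, timedelta
import statistics


def weekdays_spanned(first, last):
    """Count Monday-to-Friday days from first to last, both inclusive."""
    total = (last - first).days + 1
    return sum(
        1 for k in range(total) if (first + timedelta(days=k)).weekday() < 5
    )


def window_spans(dates, window, step):
    """Weekdays covered by each window of `window` consecutive time points."""
    return [
        weekdays_spanned(dates[s], dates[s + window - 1])
        for s in range(0, len(dates) - window + 1, step)
    ]


# Toy example: two trading weeks in January 2010 with one date deleted.
days = [4, 5, 7, 8, 11, 12, 13, 14, 15]
dates = [date(2010, 1, d) for d in days]
spans = window_spans(dates, window=5, step=1)
print(spans, statistics.mean(spans), statistics.pstdev(spans))
```

Under the same 5-day-week reading, 416 days correspond to about
416 / (5 × 52 / 12) ≈ 19.2 months.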

4.3.2. RP parameters 



There are several ways of constructing an RP from data p^, e.g., with 
a fixed threshold ε or a varying one, with or without embedding. In our
analysis, we do not embed the time series. Also, there is no fixed rule to
select ε. It depends on the objectives of the study and the nature of the
data set. The choice of ε should ensure that the recurrence matrix R
represents the dynamics of the system. It should neither be too large (to avoid
counting spurious recurrences) nor be too small (to avoid excluding crucial
recurrences). One rule of thumb is to ensure that ε does not exceed 10% of
the maximum distance between the points in the time series [31]. Another
approach is to consider the recurrence-based measure as a signal detector
and then choose the ε value that yields the maximum power from the signal
for the detector being considered [32].



Figure 6: Selecting the window size. (a)-(c) The range of days contained in windows
of sizes 150, 250 and 350 time points as they are moved along the time series in steps of
10 time points. The horizontal dashed lines represent the mean number of days. (d)
The standard deviation of the distribution of range of days contained in windows sized
between 150 and 350 time points.

Figure 7: Recurrence plots for different thresholds. (a) RP for 1000 days of FTSE
100, ε = 10% of maximum distance. (b) RP obtained by normalizing the data in (a), and
ε = 0.1, which is around 2% of the maximum distance. This gives a clearer visualization
of the finer structure. For both RPs, the time series were not embedded.

In our study we find that keeping ε ≈ 10% of the maximum distance often
results in a 'dense' RP (Fig. 7(a)). Therefore, to capture the finer recurrence
structures while still allowing for sufficient statistics in the CPR, we choose
ε = 0.1 as the threshold for all RPs and normalize the time series (or any
segments thereof) before the recurrence-based calculations. This effectively
reduces the recurrence threshold to about 2% of the maximum distance and
gives a 'clearer' RP (Fig. 7(b)). The RPs given in Fig. 8 were obtained with
ε = 0.1, where it corresponds to around 2-3% of the maximum distance.
We find that the qualitative features of recurrences are robust to this choice
of ε. In Fig. 9, the p(τ) curves for the Hang Seng index obtained from its RP
(Fig. 8(h)) are shown for ε = 1%, 2% and 3% of the maximum distance.
Although the magnitude of recurrences increases with increasing threshold, the
qualitative nature of the curves remains the same throughout. The results
obtained in this study are thus robust to small changes in ε.
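
The relation between an absolute threshold and a percentage of the maximum
distance can be made concrete with a small sketch. For a scalar, non-embedded
series the largest pairwise distance is simply the range of the data; the
function names below are hypothetical.

```python
def max_distance(series):
    """Largest pairwise distance in a scalar, non-embedded series."""
    return max(series) - min(series)


def threshold_fraction(series, eps):
    """Express an absolute threshold eps as a fraction of the maximum distance."""
    return eps / max_distance(series)


def threshold_from_fraction(series, fraction):
    """Absolute threshold corresponding to a fraction of the maximum distance."""
    return fraction * max_distance(series)


# A normalized series spanning 5 standard deviations: eps = 0.1 is about 2%.
toy = [-2.0, 0.5, 3.0]
print(threshold_fraction(toy, 0.1))
```

This matches the statement that ε = 0.1 on normalized data corresponds to
roughly 2-3% of the maximum distance when the normalized values span about
3 to 5 standard deviations.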

4.4. Analyzing connectivity trends: Primary analysis

First, a window of 250 time points was moved along pairs of time series
and the CPR and ρ values were estimated. This was done for all 36 pairs
possible between the nine time series. Next, the CPR values were binned
in three categories: (a) strong connectedness (|CPR| > 0.8), (b) moderate
connectedness (0.5 < |CPR| < 0.8), and (c) weak connectedness (|CPR| < 0.5),
and the number of connections (out of the 36) in each bin was counted
for every window.
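
The binning step can be sketched as follows. The text leaves the boundary
values |CPR| = 0.5 and |CPR| = 0.8 unassigned; placing both in the moderate
bin is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter


def connectedness_level(cpr_value):
    """Classify a CPR value by its magnitude."""
    magnitude = abs(cpr_value)
    if magnitude > 0.8:
        return "strong"
    if magnitude >= 0.5:  # assumption: boundary values count as moderate
        return "moderate"
    return "weak"


def count_connections(cpr_values):
    """Number of connections per bin for one window (e.g. 36 index pairs)."""
    counts = Counter(connectedness_level(v) for v in cpr_values)
    return {level: counts.get(level, 0) for level in ("strong", "moderate", "weak")}


# Toy example with made-up CPR values for seven pairs.
print(count_connections([0.95, -0.85, 0.6, 0.5, 0.8, 0.1, -0.3]))
```

Applying `count_connections` to each window position in turn gives the three
time series of connection counts.
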
Figure 8: Recurrence plots for the nine stock indices. (a) CAC 40. (b) FTSE 100.
(c) DAX. (d) NASDAQ. (e) DJIA. (f) S&P 500. (g) Nikkei 225. (h) Hang Seng. (i) Strait
Times. Recurrence threshold ε = 0.1; RPs obtained without embedding.

Figure 9: p(τ) curves for Hang Seng index. ε = 1% (bottom curve), ε = 2% (middle
curve), and ε = 3% (top curve) of the maximum distance. All three curves have the
same pattern. Inset: The p(τ) curves for τ = 950 to τ = 2050 to give a better view of the
similar qualitative nature of the three curves.



As a second step, the behavior of CPR during the Dot-Com bubble in 2000
was considered, in which: (a) taking a pair of time series, the dates when
they peaked during the Dot-Com bubble were made to coincide, (b) a window of
250 time points was moved from 500 time points before the peak to 250 time
points after, and (c) the respective CPR (or ρ) was obtained.
At every step, all CPR (or ρ) values were tested for significance with 100
Twin Surrogates at the 10% significance level.
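
The peak alignment in step (a) can be sketched as below. It assumes the caller
passes segments that already bracket the bubble, so that the global maximum of
each segment is the peak of interest; the function name `align_peaks` is
hypothetical.

```python
def align_peaks(x, y):
    """Trim two series so that their maxima fall at the same index.

    Returns the trimmed series and the common index of the peak.
    """
    peak_x = x.index(max(x))
    peak_y = y.index(max(y))
    before = min(peak_x, peak_y)                  # points kept before the peak
    after = min(len(x) - peak_x, len(y) - peak_y)  # points kept from the peak on
    return (
        x[peak_x - before: peak_x + after],
        y[peak_y - before: peak_y + after],
        before,
    )


# Toy example: the two series peak at different positions.
xa, ya, peak = align_peaks([1, 2, 5, 3, 1], [0, 9, 4, 3, 2, 1])
print(xa, ya, peak)
```

A sliding window can then be run over the two trimmed series, from 500 time
points before the returned peak index to 250 time points after it.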

5. Results and Discussion 

In this section, we present the main results obtained using the extended
CPR approach and the steps described in Sec. 4.4.

5.1. Trends in CPR 

Three points are evident from Fig. 10 (which shows trends in CPR for
six pairs of indices along with the corresponding p-values, viz., (a) CAC 40
and FTSE 100, (b) DAX and NASDAQ, (c) DAX and Nikkei 225, (d) DJIA
and S&P 500, (e) S&P 500 and Strait Times, and (f) Nikkei 225 and Hang
Seng).

i. The CPR does not have a monotonic trend over time. Rather, it
oscillates erratically, with short periods of low CPR interspersing broader
bands of high CPR.

ii. The regions of low CPR values have high p-values (and vice versa),
meaning that the lower ranges of CPR tend to be less statistically
significant in comparison to the higher values.

iii. These patterns in the CPR are the same for all index pairs, of which only
six are shown in Fig. 10.

Thus, if CPR were to be interpreted as a measure of similarity, and hence
connectedness, it means that the financial world is not moving monotonically
to a globally connected scenario, but is rather oscillating between long
periods of strong connectedness and short spans of low connectivity. (Nothing
conclusive can be said about the nature of connections in the short periods
of low CPR as the estimates therein are not statistically significant.) In a
sense, what it says is that the more things change the more they remain the
same.



5.2. Patterns of connectivity 

The patterns in the number of connections in each of the three bins,
summed up among all pairs of indices, reinforce the picture put forth in
Fig. 10. This is shown in Fig. 11. The strong and moderate connections are
always statistically significant except for a few instances. The weak
connections are clearly more prone to be statistically non-significant. Again,
we see that there is no global trend in Fig. 11, i.e., there is no indication
of an increasing global connectivity. Instead, we see that the numbers of
strong and moderate connections oscillate erratically, implying that these
nine indices come close to each other, and then move apart, and then come
close again, and so on. Had all of them moved to higher connectivity over
time, the number of connections in the strong category would have increased
progressively, and the numbers in the other two bins would have gone down
with it.

The strong and moderate connections move (loosely) in phase with each
other (see Fig. 12). Starting from the dip in Jan-1999, consecutive dips
occur roughly at Dec-2000, Jul-2002, Sep-2003, Feb-2005, Sep-2006, Feb-2008,
and Oct-2009, meaning that the periods of these dips lie between 14 and 20
months. This might be indicative of the Kitchin business cycle [33].
Also, the number of 'weak' connections (Fig. 11(c)) is anti-phase to the
numbers of 'strong' and 'moderate' connections (Figs. 11(a)-(b) and Fig. 12),
because the total number of connections has to be conserved while the number
in each of the three levels of connectedness goes through cyclical patterns.



Figure 11: Patterns of connectivity. The number of connections in each bin along
time. (a) Strong connections. (b) Moderate connections. (c) Weak connections. The bold
curves represent the number by counting statistically significant CPR values alone, while
the dotted curves represent the number by counting all CPR values in each respective
bin. Window size = 250 time points. Step size = 10 time points.

Figure 12: Strong and moderate connections. Strong (solid) and moderate (dotted)
connections (from Fig. 11(a) and (b)) shown together. Only statistically significant CPR
values were counted.

5.3. CPR in the Dot-Com bubble

Figs. 13 and 14 give an insight into the way stock indices approach a crisis
and then recede from it. Even though the indices peaked and crashed on
distinct days (sometimes months apart), once the time series are shifted to
align the peak-off dates (Fig. 13(a)), the CPR increases around the peak-off
date and decreases thereafter. For CAC 40 and Strait Times, the decrease in
CPR starts almost at the peak-off date (Fig. 13(b)). On the other hand, for
DAX and NASDAQ it occurs after the peak-off date (Fig. 14(a)) and for NASDAQ
and DJIA it happens before (Fig. 14(b)). The increase and subsequent decrease
of CPR is common to all of them. Thus, the probabilities of recurrence have
strong correlation around the peak, i.e., irrespective of the actual date on
which a particular index may peak, all indices approach and recede from a
crisis similarly.

Pearson's ρ also captures this behavior, but CPR is a more sensitive measure,
as ρ does not change as sharply as CPR around the peak-off date. Also,
the CPR values tend to pass the statistical significance test more often than
ρ (see Fig. 13(b) and Fig. 14). This point is further elaborated in Sec. 5.5.

5.4. Trends in Pearson's ρ

The trend in Pearson's ρ is strongly different from CPR, as shown for
S&P 500 and Strait Times in Fig. 15 (which is the same pair as in Fig. 10(e)).
Fig. 15 shows an overall movement towards higher correlation as time
progresses, in contrast to the oscillating pattern of Fig. 10(e). The question
then arises: which is the correct picture? Here, it is crucial to emphasize
that these two measures capture different aspects of the time series. While ρ
measures the tendency of the time series values to move together in one
direction (or opposite directions), CPR measures the tendency of the time
series values to return to earlier values at similar time scales. Hence, it is
expected that their results are different. However, the case for CPR can be
made as follows: being based on recurrence rates, it captures the essential
dynamical nature of the system. This is in contrast to Pearson's ρ, which is
simply a statistical comparison of co-evolution of states. Pearson's ρ is less
sensitive to changes in the time series, as pointed out earlier. Thus, while
it may be true that stock indices tend to show more co-movement towards the
latter half of our data sets, this does not necessarily imply that they will
continue to do so, as the corresponding system dynamics move in and out of
periods of strong correlation (see Sec. 5.1).

Figure 13: CPR in the Dot-Com bubble (around the year 2000). (a) The series
CAC 40 (bold) and Strait Times (dotted) with their peak-off date during the bubble
coinciding. The vertical dotted line marks the peak-off date. (b) Corresponding CPR
(circles) and Pearson's ρ (squares) values. Statistically significant and non-significant
values are represented by filled and empty markers respectively.

Figure 14: CPR in the Dot-Com bubble. (a) DAX vs. NASDAQ. (b) NASDAQ vs.
DJIA. Figure keys and markers are the same as in Fig. 13.

Figure 15: Trends in Pearson's ρ. Pearson's ρ (bold curve) and its corresponding
p-values (dotted curve) for the pair S&P 500 and Strait Times. The horizontal dotted line
denotes the test significance level, p = 0.1.

Figure 16: Patterns of significance. (a) CPR, and (b) Pearson's ρ, along with p-values.

5.5. Advantages of CPR 

We highlight the following advantages of CPR as a measure for estimating 
connections between financial data sets. 

i. It is able to extract patterns even from noisy data sets, as is the case
with most financial data. Moreover, it does not require that the data
are distributed normally, as is required for the proper usage of ρ.

ii. The data sets need not necessarily be embedded for its estimation.

iii. It can be estimated for short time series as well, as is done in the current
work.

iv. It does not require high-frequency sampling of the data, which is common
in most financial analyses. Our analysis was done with daily sampled
data which was freely available on the internet.

v. It tends to have lower p-values than ρ, as seen from the spread of points
in the top-left corners of the plots in Fig. 16. Most CPR values are
grouped close to the p = 0 axis, whereas the ρ values have a broader
spread. This implies that the null hypothesis tends to be rejected more
readily with CPR.

6. Conclusion 

In summary, we present a new way of looking at connections between 
different stock markets. This perspective, involving recurrences and CPR, 
uncovers that, over the past two decades, markets have not proceeded to- 
wards an ever increasing connected state but rather moved in and out of 
strongly connected periods. It also points out that stock indices share simi- 
lar dynamics during a bubble. 

Moreover, we highlight the importance of significance testing, with the help
of Twin Surrogates, in interpreting measures that analyze field data. In this
respect, CPR proves to be a robust measure and, moreover, one that can reveal
patterns even from relatively poor data sets (in terms of noise, low frequency
of sampling, and short time series length).
Lastly, we suggest a slight modification of the CPR, which removes its
tendency to overestimate coupling, thereby extending its capacity as a
measure that can estimate connections between time series data effectively.
This opens up new possibilities for analyzing couplings between financial
time series using (cross) recurrence analysis.